Sampling for Approximate Reduct in very Large Datasets

Authors

  • Keyun Hu
  • Lili Diao
  • Yuchang Lu
  • Chunyi Shi
Abstract

Rough set theory provides a formal framework for data mining. The reduct is the most important concept in applying rough sets to data mining. A reduct is a minimal attribute set that preserves the classification power of the original dataset. Finding a reduct is similar to the feature selection problem. In this paper, we propose two reduct algorithms. One is based on attribute frequency in the discernibility matrix; the other uses a similar idea together with sampling techniques for large datasets. Empirical analysis shows that both algorithms are efficient.
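The abstract does not spell out either algorithm, so the following is only a minimal sketch, assuming a consistent decision table stored as a list of dicts, of one common way to turn discernibility-matrix attribute frequencies into a reduct: a greedy hitting-set heuristic followed by backward elimination, plus a wrapper that runs the same heuristic on a random sample of rows. The function names `discernibility_entries`, `frequency_reduct`, and `sampled_reduct`, the `sample_size` and `seed` parameters, and the toy table are hypothetical illustrations, not the authors' exact procedure.

```python
import random
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, cond_attrs, decision):
    """Non-empty discernibility-matrix entries for object pairs with different decisions.

    Each entry is the frozenset of condition attributes on which the two objects differ.
    Duplicates are kept so that attribute frequencies reflect matrix multiplicities.
    """
    entries = []
    for x, y in combinations(rows, 2):
        if x[decision] == y[decision]:
            continue
        diff = frozenset(a for a in cond_attrs if x[a] != y[a])
        if diff:  # empty only for inconsistent pairs, which no attribute can separate
            entries.append(diff)
    return entries


def frequency_reduct(rows, cond_attrs, decision):
    """Greedy reduct: repeatedly add the attribute occurring in the most uncovered entries."""
    entries = discernibility_entries(rows, cond_attrs, decision)
    uncovered = list(entries)
    chosen = []
    while uncovered:
        freq = Counter(a for e in uncovered for a in e)
        # max() keeps the first maximum, so ties go to the alphabetically smallest name
        best = max(sorted(freq), key=lambda a: freq[a])
        chosen.append(best)
        uncovered = [e for e in uncovered if best not in e]
    # Backward elimination: drop any attribute whose removal still covers every entry
    for a in list(chosen):
        rest = [b for b in chosen if b != a]
        if all(e & set(rest) for e in entries):
            chosen = rest
    return sorted(chosen)


def sampled_reduct(rows, cond_attrs, decision, sample_size, seed=0):
    """Approximate reduct computed from a uniform random sample of the rows (hypothetical)."""
    rng = random.Random(seed)
    sample = rows if len(rows) <= sample_size else rng.sample(rows, sample_size)
    return frequency_reduct(sample, cond_attrs, decision)


# Toy table in which the decision d equals a XOR b; attribute c duplicates a
rows = [
    {"a": 0, "b": 0, "c": 0, "d": 0},
    {"a": 0, "b": 1, "c": 0, "d": 1},
    {"a": 1, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
]
print(frequency_reduct(rows, ["a", "b", "c"], "d"))  # → ['a', 'b']
```

Because {b, c} also discerns every pair in the toy table, the choice between tied attributes is arbitrary here. When `sampled_reduct` works on a subset of rows, the returned set is only guaranteed to discern the sampled pairs, which is why such a result is an approximate reduct of the full dataset.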

Similar resources

Fast Uncertainty Sampling for Labeling Large E-mail Corpora

One of the biggest challenges in building effective anti-spam solutions is designing systems to defend against the ever-evolving bag of tricks spammers use to defeat them. Because of this, spam filters that work well today may not work well tomorrow. The adversarial nature of the spam problem makes large, up-to-date, and diverse e-mail corpora critical for the development and evaluation of new ...

Hunches and Sketches: rapid interactive exploration of large datasets through approximate visualisations

Information visualisation presents powerful techniques for data analytics. However, rendering visualisations of big datasets is impractical on commodity hardware. There is increasing interest in approaches where data sampling and probabilistic algorithms are used to support faster processing of large datasets. This approach to approximate computation has not yet paid close attention to the way ...

An interactive framework for spatial joins: a statistical approach to data analysis in GIS

Many Geographic Information Systems (GIS) handle a large volume of geospatial data. Spatial joins over two or more geospatial datasets are very common operations in GIS for data analysis and decision support. However, evaluating spatial joins can be very time intensive due to the size of datasets. In this paper, we propose an interactive framework that provides faster approximate answers of spa...

Feature ranking in rough sets

We propose a novel feature ranking technique using discernibility matrix. Discernibility matrix is used in rough set theory for reduct computation. By making use of attribute frequency information in discernibility matrix, we develop a fast feature ranking mechanism. Based on the mechanism, two heuristic reduct computation algorithms are proposed. One is for optimal reduct and the other for app...

Ensembles of Classifiers Based on Approximate Reducts

The problem of improving rough set based expert systems by modifying a notion of reduct is discussed. The notion of approximate reduct is introduced, as well as some proposals of quality measure for such a reduct. The complete classifying system based on approximate reducts is presented and discussed. It is proved that the problem of finding optimal set of classifying agents based on approximat...

Publication date: 2000